2 Probability Basics

1 Random Variable and Distribution

Random Variable

Given a probability space $(\Omega,\mathcal{F},P)$, a random variable (RV) is a function $X:\Omega\to\mathbb{R}$ such that $\{X\le x\}\triangleq\{\omega\in\Omega \mid X(\omega)\le x\}\in\mathcal{F}$ for all $x\in\mathbb{R}$ (we call such an $X$ $\mathcal{F}$-measurable).

Distribution, c.d.f

Given a RV $X$, its (cumulative) distribution function (c.d.f.) $F_X$ is defined as $F_X(x)=P(X\le x)$, $x\in(-\infty,\infty)$.

1.1 Discrete RV

A RV $X$ is discrete if $\mathrm{Range}(X)$ is either finite or countably infinite.

For example, for $A\in\mathcal{F}$, the indicator RV $I_A(\omega)=\begin{cases}1, & \omega\in A,\\ 0, & \omega\notin A\end{cases}$ is a discrete RV.

Its probability mass function (p.m.f.) $p_X(x)=P(X=x)$ has to satisfy $p_X(x)\ge 0$ for all $x$, and $\sum_{x\in\mathrm{Range}(X)}p_X(x)=1$.

1.2 Continuous RV

$F_X(x)$ is continuous for all $x\in\mathbb{R}$. Then we consider the probability density function (p.d.f.) $f_X(x)=\frac{dF_X(x)}{dx}$, so that $F_X(x)=\int_{-\infty}^{x}f_X(y)\,dy$.
More generally, $P(X\in A)=\int_A f_X(x)\,dx$.
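As a numerical sanity check of the relation $F_X(x)=\int_{-\infty}^{x}f_X(y)\,dy$, here is a sketch using an exponential RV with rate $\lambda=2$ (an illustrative choice, not from the text), whose p.d.f. is $f_X(x)=\lambda e^{-\lambda x}$ for $x\ge 0$ and whose c.d.f. is $F_X(x)=1-e^{-\lambda x}$:

```python
import math

lam = 2.0  # rate of an exponential RV (illustrative choice)

def pdf(x):
    # f_X(x) = lam * exp(-lam * x) for x >= 0, else 0
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def cdf_exact(x):
    # closed-form c.d.f. of the exponential distribution
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def cdf_numeric(x, n=100_000):
    # trapezoidal approximation of the integral of f_X over [0, x]
    h = x / n
    total = 0.5 * (pdf(0.0) + pdf(x))
    for i in range(1, n):
        total += pdf(i * h)
    return total * h

print(cdf_exact(1.0), cdf_numeric(1.0))  # both close to 1 - e^{-2}
```

The numeric integral matches the closed-form c.d.f. to within the trapezoidal-rule error.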

2 Expectation

Expectation/Mean

For a function $g:\mathbb{R}\to\mathbb{R}$, define the expectation $E[g(X)]$ as follows.

  • For a discrete RV, $E[g(X)]=\sum_{x\in\mathrm{Range}(X)}g(x)\,P(X=x)$.
  • For a continuous RV, $E[g(X)]=\int_{-\infty}^{\infty}g(x)\,f_X(x)\,dx$.

Provided that $E[|g(X)|]<\infty$ (i.e. the sum/integral is absolutely convergent).

By the following theorem, we need absolute convergence to ensure that $E[g(X)]$ is well defined: without it, the value of the sum could depend on the order of summation.

Theorem (Riemann Rearrangement Theorem)

If $\sum_{n=1}^{\infty}a_n$ converges but $\sum_{n=1}^{\infty}|a_n|$ diverges, then for any given $r\in[-\infty,\infty]$, there exists a permutation $\pi$ such that $\sum_{n=1}^{\infty}a_{\pi(n)}=r$.
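A small sketch of the rearrangement idea, using the alternating harmonic series $\sum_{n\ge 1}(-1)^{n+1}/n$ (which converges conditionally to $\ln 2$) as an assumed example: greedily take positive terms while below the target and negative terms while above it, and the partial sums are steered toward any chosen $r$.

```python
import math

def rearranged_partial_sum(r, n_terms=100_000):
    # Rearrange the series 1 - 1/2 + 1/3 - 1/4 + ... so that its
    # partial sums approach r instead of ln 2.
    pos = 1  # next odd denominator: positive terms +1/1, +1/3, ...
    neg = 2  # next even denominator: negative terms -1/2, -1/4, ...
    s = 0.0
    for _ in range(n_terms):
        if s <= r:       # below target: take the next positive term
            s += 1.0 / pos
            pos += 2
        else:            # above target: take the next negative term
            s -= 1.0 / neg
            neg += 2
    return s

print(rearranged_partial_sum(1.0))  # close to 1.0, not ln 2 ≈ 0.693
```

The same terms, summed in a different order, converge to a different value, which is exactly why expectations are only defined under absolute convergence.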

Variance

Define the variance of $X$: $\mathrm{Var}(X)=E\big[(X-E[X])^2\big]$.

Claim (Linearity of Expectation)

Let $X_1,\dots,X_n$ be RVs defined on the same probability space $(\Omega,\mathcal{F},P)$ such that each $E[X_i]$ is well defined. Then for all constants $c_1,\dots,c_n$, $E\!\left[\sum_{i=1}^{n}c_iX_i\right]=\sum_{i=1}^{n}c_iE[X_i]$.

By the claim, expanding $(X-E[X])^2=X^2-2XE[X]+(E[X])^2$ gives $\mathrm{Var}(X)=E[X^2]-(E[X])^2$.
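As a concrete check of the shortcut formula, here is a sketch computing the variance of a fair six-sided die (an illustrative example) both from the definition and as $E[X^2]-(E[X])^2$, using exact rational arithmetic:

```python
from fractions import Fraction

# p.m.f. of a fair six-sided die: P(X = k) = 1/6 for k = 1..6
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def expect(g):
    # E[g(X)] = sum over Range(X) of g(x) * P(X = x)
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x)                        # E[X] = 7/2
var_def = expect(lambda x: (x - mean) ** 2)       # E[(X - E[X])^2]
var_short = expect(lambda x: x ** 2) - mean ** 2  # E[X^2] - (E[X])^2

print(mean, var_def, var_short)  # 7/2 35/12 35/12
```

Both routes give $\mathrm{Var}(X)=35/12$, as the identity promises.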

Covariance

Let $X,Y$ be RVs on the same probability space. Then $\mathrm{Cov}(X,Y)=E\big[(X-E[X])(Y-E[Y])\big]=E[XY]-E[X]E[Y]$.

Similarly, we can show

$\mathrm{Var}(X+Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)+2\,\mathrm{Cov}(X,Y)$.

Theorem (Tail Sum Formula)

Let $X$ be a RV with range $\{0,1,2,\dots\}$. Then $E[X]=\sum_{k=1}^{\infty}P(X\ge k)$.

This follows directly from the definition: write $E[X]=\sum_{k=1}^{\infty}kP(X=k)$, expand $k=\sum_{j=1}^{k}1$, and swap the order of summation.
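A quick check of the tail sum formula on a fair six-sided die (an illustrative finite-range RV, so both sums are exact):

```python
from fractions import Fraction

# fair die: P(X = k) = 1/6 for k = 1..6
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

# direct definition: E[X] = sum_k k * P(X = k)
mean_direct = sum(k * p for k, p in pmf.items())

# tail sum: E[X] = sum_{k >= 1} P(X >= k); here P(X >= k) = (7 - k)/6
mean_tail = sum(sum(p for x, p in pmf.items() if x >= k)
                for k in range(1, 7))

print(mean_direct, mean_tail)  # 7/2 7/2
```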

3 Conditional Probability

Given that event $B$ happens, what is the probability that $A$ also happens?
We want to consider a new probability space $(B,\mathcal{F}_B,P_B)$. How should we define $P_B$ so that it is consistent with $P$?

For all $E_1,E_2\in\mathcal{F}$ such that $E_1\subseteq B$ and $E_2\subseteq B$, we want $\frac{P(E_1\cap B)}{P(E_2\cap B)}=\frac{P_B(E_1\cap B)}{P_B(E_2\cap B)}$, which forces $P_B=cP$ on subsets of $B$.
Since $P_B(B)=1$, we know $c=P(B)^{-1}$.

So for all $A,B\in\mathcal{F}$ such that $P(B)>0$, define the conditional probability $P(A\mid B)=P_B(A\cap B)=\frac{P(A\cap B)}{P(B)}$.
$A$ and $B$ are independent if $P(A\mid B)=P(A)$, i.e. $P(A\cap B)=P(A)P(B)$.
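A worked example (two fair dice, an illustrative choice not from the text): with $A$ = "the sum is 8" and $B$ = "the first die shows 3", enumerate the sample space and compute $P(A\mid B)=P(A\cap B)/P(B)$ directly.

```python
from fractions import Fraction
from itertools import product

# sample space: ordered rolls of two fair dice, each outcome has prob 1/36
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def prob(event):
    # P(E) = sum of outcome probabilities over outcomes in E
    return sum(p for w in omega if event(w))

A = lambda w: w[0] + w[1] == 8   # sum of the dice is 8
B = lambda w: w[0] == 3          # first die shows 3

p_A_given_B = prob(lambda w: A(w) and B(w)) / prob(B)
print(p_A_given_B, prob(A))  # 1/6 vs 5/36
```

Here $P(A\mid B)=1/6\ne P(A)=5/36$, so $A$ and $B$ are not independent.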

Independence of Events

Events $E_1,\dots,E_n$ are independent iff for every $k=2,\dots,n$ and every $k$-subset $\{i_1,\dots,i_k\}\subseteq\{1,\dots,n\}$, $P\!\left(\bigcap_{j=1}^{k}E_{i_j}\right)=\prod_{j=1}^{k}P(E_{i_j})$.

Independence of RVs

RVs $X,Y$ on the same probability space $(\Omega,\mathcal{F},P)$ are said to be independent iff $P[(X\le x)\cap(Y\le y)]=P(X\le x)P(Y\le y)$ for all $x,y\in\mathbb{R}$.
Equivalent conditions:

  • For the discrete case, $P[(X=x)\cap(Y=y)]=P(X=x)P(Y=y)$ for all $x,y$.
  • For the continuous case, $f_{X,Y}(x,y)=f_X(x)f_Y(y)$ for all $x,y$.

RVs $X_1,\dots,X_n$ on the same probability space $(\Omega,\mathcal{F},P)$ are said to be mutually independent iff $P\!\left[\bigcap_{i=1}^{n}(X_i\le x_i)\right]=\prod_{i=1}^{n}P(X_i\le x_i)$ for all $x_1,\dots,x_n\in\mathbb{R}$.

Partition

Let $A_1,\dots,A_n\in\mathcal{F}$ and $B\in\mathcal{F}$. We say $A_1,\dots,A_n$ form a partition of $B$ if

  • $B=A_1\cup\cdots\cup A_n$;
  • $A_i\cap A_j=\emptyset$ for all $i\ne j$.

Theorem (Law of Total Probability)

Suppose $B_1,\dots,B_n\in\mathcal{F}$ is a partition of $\Omega$ such that $P(B_i)>0$ for all $i$. Then for all $A\in\mathcal{F}$, $P(A)=\sum_{i=1}^{n}P(A\mid B_i)P(B_i)$.
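A small numerical sketch of the law of total probability (the setup is an assumed example): a coin is drawn at random, fair with probability $2/3$ and biased toward heads ($P(\text{heads})=3/4$) with probability $1/3$; the two cases partition $\Omega$.

```python
from fractions import Fraction

# partition of Omega by which coin is drawn (illustrative numbers):
# B1 = fair coin, B2 = biased coin
prior = {"B1": Fraction(2, 3), "B2": Fraction(1, 3)}       # P(B_i)
p_heads_given = {"B1": Fraction(1, 2), "B2": Fraction(3, 4)}  # P(A | B_i)

# law of total probability: P(A) = sum_i P(A | B_i) P(B_i)
p_heads = sum(p_heads_given[b] * prior[b] for b in prior)
print(p_heads)  # 1/3 + 1/4 = 7/12
```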

A similar result applies to a countably infinite partition of Ω.

4 Bayes' Formula

Theorem (Bayes' Formula)

Let $B_1,\dots,B_n\in\mathcal{F}$ be a partition of $\Omega$ with $P(B_i)>0$ for all $i$. Then for all $A\in\mathcal{F}$ with $P(A)>0$, $P(B_i\mid A)=\frac{P(B_i\cap A)}{P(A)}=\frac{P(A\mid B_i)P(B_i)}{\sum_{j=1}^{n}P(A\mid B_j)P(B_j)}$.
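A classic application of Bayes' formula, sketched with illustrative numbers (prevalence, sensitivity, and false-positive rate are assumptions, not from the text): a rare disease $D$ with $P(D)=1/1000$, a test with $P(+\mid D)=99/100$ and $P(+\mid D^c)=5/100$, using the partition $\{D, D^c\}$.

```python
from fractions import Fraction

# illustrative numbers (assumptions):
p_d = Fraction(1, 1000)       # prevalence P(D)
p_pos_d = Fraction(99, 100)   # sensitivity P(+ | D)
p_pos_nd = Fraction(5, 100)   # false-positive rate P(+ | not D)

# denominator via the law of total probability over {D, not D}
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)

# Bayes' formula: P(D | +) = P(+ | D) P(D) / P(+)
p_d_pos = (p_pos_d * p_d) / p_pos
print(p_d_pos, float(p_d_pos))
```

Despite the accurate test, $P(D\mid +)$ is under $2\%$, because the false positives among the healthy majority dominate the numerator.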